Workflow sharing with automated metadata validation and test execution to improve the reusability of published workflows

Abstract Background Many open-source workflow systems have made bioinformatics data analysis procedures portable. Sharing these workflows provides researchers easy access to high-quality analysis methods without the requirement of computational expertise. However, published workflows are not always guaranteed to be reliably reusable. Therefore, a system is needed to lower the cost of sharing workflows in a reusable form. Results We introduce Yevis, a system to build a workflow registry that automatically validates and tests workflows to be published. The validation and test are based on the requirements we defined for a workflow being reusable with confidence. Yevis runs on GitHub and Zenodo and allows workflow hosting without the need of dedicated computing resources. A Yevis registry accepts workflow registration via a GitHub pull request, followed by an automatic validation and test process for the submitted workflow. As a proof of concept, we built a registry using Yevis to host workflows from a community to demonstrate how a workflow can be shared while fulfilling the defined requirements. Conclusions Yevis helps in the building of a workflow registry to share reusable workflows without requiring extensive human resources. By following Yevis’s workflow-sharing procedure, one can operate a registry while satisfying the reusable workflow criteria. This system is particularly useful to individuals or communities that want to share workflows but lacks the specific technical expertise to build and maintain a workflow registry from scratch.


Experimental design and statistics
Results: We introduce Yevis, a system to build a workflow registry that automatically validates and tests workflows to be published. The validation and test are based on the requirements we defined for a workflow being reusable with confidence. Yevis runs on GitHub and Zenodo and allows workflow hosting without the need of dedicated computing resources. A Yevis registry accepts workflow registration via a GitHub pull request, followed by an automatic validation and test process for the submitted workflow. As a proof of concept, we built a registry using Yevis to host workflows from a community to demonstrate how a workflow can be shared while fulfilling the defined requirements.
Conclusions: Yevis helps in the building of a workflow registry to share reusable workflows without requiring extensive human resources. By following Yevis's workflow-sharing procedure, one can operate a registry while satisfying the reusable workflow criteria. This system is particularly useful to individuals or communities that want to share workflows but lacks the specific technical expertise to build and maintain a workflow registry from scratch.

Workflow | Workflow language | Continuous integration | Open science | Reproducibility | Reusability
Correspondence: t.ohta@dbcls.rois.ac.jp Background Due to the low cost and high throughput of measurement instruments that acquire digital data from biological samples, the volume of readily available data has become enormous (1). To obtain scientific knowledge from large datasets, a number of computational data analysis processes are required, for example, in DNA sequencing, sequence read trimming, alignment with reference genomes, and annotation using public databases (2). Researchers have developed analysis tools for each process and often publish them as opensource software (3). To avoid the need to execute these tools manually, researchers usually write a script to combine them into what is called a workflow (4).
To build and maintain a complex workflow that combines many tools efficiently (5), many workflow systems have been developed (6). Some of these systems have large user communities, such as Galaxy (7), the Common Workflow Language (CWL) (8), the Workflow Description Language (WDL) (9), Nextflow (10), and Snakemake (11). Although each system has its unique characteristics, they have a common aim: to make computational methods portable, maintainable, reproducible, and shareable (4). Most systems have a syntax for describing a workflow that is part of what is called a workflow language. They also have an execution system that works with computational frameworks, such as a job scheduler and container virtualization (12). With the popularization of workflow systems, many research communities have worked on workflow sharing in the form of a workflow language. Workflow registries, such as Work-flowHub (13), Dockstore (14), and nf-core (15), have been developed as public repositories for the sharing of workflows. Workflow execution systems also utilize these registries as their tool libraries. To improve the interoperability of workflow registries, the Global Alliance for Genomics Health (GA4GH) proposed the Tool Registry Service (TRS) specification that provides a standard protocol for sharing workflows (16,17). Sharing workflows not only increases the transparency of research but also helps researchers by facilitating the reuse of programs, thereby making data analysis procedures more efficient. However, workflows that are accessible on the internet are not always straightforward for others to use. If a published workflow is not appropriately licensed, researchers cannot use it because the permission for secondary use is unclear. A workflow may also not be executable because its format is incorrect, or dependent files cannot be found. Even if a workflow can be executed, the correctness of its operation often cannot be verified because no tests have been provided. Furthermore, the contact details of the person responsible for the published workflow are not always attached to it. It is noteworthy that these issues in reusing public workflows are not often obvious to workflow developers. To clarify the requirements for workflow sharing, Goble et al. have proposed the concept of a FAIR (findable, accessible, interoperable, and reusable) workflow (18). This inheritance of the FAIR principles (19) focuses on the structure, forms, ver-D R A F T sioning, executability, and reuse of workflows. However, the question remains as to who should guarantee to users that published workflows can be reused following the FAIR workflow guidelines. WorkflowHub asks submitters to take responsibility for workflows: when a workflow is registered on WorkflowHub, the license and author identity should be clearly stated, encouraging them to publish FAIR workflows. However, there is no obligation as to the correctness of the workflow syntax, its executability, or testing. Not placing too many responsibilities on workflow submitters keeps obstacles to submission low, which will likely increase the diversity of public workflows on WorkflowHub; however, it will also likely increase the number of one-off submissions, which one can assume are at higher risk for the workflow problems previously described. Unlike WorkflowHub, in nf-core, the community that operates the registry holds more accountability for published workflows. Workflow submitters are required to join the nfcore community, develop workflows according to their guidelines, and prepare them for testing. These requirements enable nf-core to collect workflows with remarkable reliability. However, the community's effort tends to focus on maintaining more generic workflows that have a large number of potential users. Consequently, nf-core states that it does not accept submissions of bespoke workflows. This is an understandable policy, as maintaining a workflow requires domain knowledge of its content, and this is difficult to maintain in the absence of the person who developed the workflow. In order to improve research efficiency through workflow sharing, research communities need the publication of diverse workflows in a reusable form. However, as shown by existing workflow registries, there is a trade-off between publishing a wide variety of workflows and maintaining the reusability of the workflows that are published. Solving this issue requires reducing the cost to developers in making and keeping their workflows reusable, which currently relies on manual effort. This is achievable by redefining the FAIR workflow concept as a set of technical requirements and providing a system that automates their validation and testing. We introduce Yevis, a system to share workflows with automated metadata validation and test execution. Through the development of Yevis, we specified a set of technical requirements that define a reusable workflow, according to the FAIR workflow concept. Yevis helps researchers and communities share workflows that satisfy the requirements by supporting a build of an independent workflow registry. To allow workflow hosting without the need of additional dedicated computing resources, Yevis works on two public web services: GitHub, a source code sharing and software development service, and Zenodo, a public research data repository. In addition, a Yevis registry provides a web-based workflow browser and the GA4GH TRS-compatible API ensures interoperability with other existing workflow registries. Yevis is particularly powerful when individuals or communities want to share workflows but are without the technical expertise to build and maintain a web application. To demonstrate that workflows can be shared that fulfill the defined requirements using Yevis, we built a registry for workflows that an existing community has managed.

Implementations
System design. First, we defined a set of requirements that the Yevis system can automatically verify or test (Table 1). By satisfying these requirements inspired by FAIR workflow, we consider a workflow is "reusable with confidence." These criteria have three aspects: workflow availability, accessibility, and traceability. To help researchers share reusable workflows, we took an approach to aid them in building their own workflow registry that automatically ensures its reusability. We define a workflow registry as a service that serves workflow information via the GA4GH TRS protocol. The information provided by the TRS API is various workflow metadata, such as author information, documentation, language type and version, dependent materials, testing materials, etc. Large files, such as dependent materials and testing materials, are not directly included in the TRS API response but are described as remote locations, such as HTTP protocol URLs. Therefore, the entities that a workflow registry collects are a set of workflow metadata described in the form of the TRS API response. In this study, therefore, we designed the system as an API server that delivers the TRS API response. In the Yevis registry, a workflow-sharing procedure is divided into three processes: submission, review, and publication ( Figure 1). To address the requirements listed in Table 1, the Yevis system automatically performs processes, such as metadata validation, workflow testing, test provenance generation, persisting associated files, DOI assignment, and TRS response deployment. To generate the TRS API response and publish it while addressing the requirements listed in Table 1, we implemented a command-line application called Yevis-cli. This application contains various utilities to support the workflow registration procedure including validation and testing. As a service and infrastructure to perform these steps, we designed Yevis to use the services of GitHub and Zenodo. Using these web services makes it possible for communities to build a workflow registry without the need of maintaining their own computer servers.
Workflow registration with automated validation and testing. To set up a Yevis registry, registry maintainers need to do an initial configuration of GitHub and Zenodo; this involves, for example, creating a GitHub repository, changing repository settings, and setting up security credentials. The online documentation "Yevis: Getting Started" shows the step-by-step procedures to deploy a workflow registry and test it (20). We defined the Yevis metadata file, a JSON or YAML format file that contains structured workflow metadata ( Figure 2). Yevis-cli uses this file as its input and output in the submission process. The Yevis metadata file will be published on the registry along with the TRS response to provide metadata that D R A F T Table 1. The requirements for a workflow to be considered reusable with confidence. We classify these requirements from the perspectives of the availability, validity, and traceability of the workflows. We propose that these requirements should be assured and provided to users by the workflow registries.

Perspective Requirement Description
Availability Main workflow description The main workflow description file is available and accessible without restriction.

Dependent materials
The dependencies of the main workflow are available, e.g., definitions of dependent workflows and tools.

Testing materials
The job configuration files for testing are available, e.g., parameter and input files. Open-source license The workflow and the related materials are published under an appropriate open-source license.

Validity
Language type The language used to describe the workflow is specified, e.g., CWL, WDL, or Nextflow.

Language version
The version of the workflow language used is specified.

Language syntax
The language syntax of the workflow is valid.

Traceability
Authors and maintainers The contact information of the authors and the maintainers is identified. Documentation The documentation of the workflow is available.

Workflow ID
The unique identifier to specify the workflow is assigned, ideally by a URI. Workflow metadata version The version number of the workflow metadata is specified.  The flowchart of the workflow registration to a Yevis repository. The workflow registration procedure is divided into three processes: the submission, review, and publication process. Each process is performed in different locations: in the submitter's local environment, as part of the GitHub pull request, or as the GitHub Actions. The generated TRS API response is deployed to GitHub Pages. The steps indicated by yellow boxes, such as validating metadata, are performed automatically using Yevis-cli.
is not included in the TRS protocol, such as an open-source license.
Submission process. Figure 3 shows the submission process using Yevis-cli. During this submission process, the workflow submitter describes the workflow metadata in their local environment and submits it through a GitHub pull request (i.e., a review request to the registry maintainer). First, Yeviscli generates a template for the Yevis metadata file, which requires the URL of the main workflow description file as an argument. In many workflow systems, the main workflow description file is the entry point for workflow execution. Yevis-cli generates a template supplemented with workflow metadata automatically collected by using the GitHub REST API and inspecting the workflow's contents. Next, the submitter needs to edit the Yevis metadata file template and add workflow tests. Yevis-cli executes a test using a GA4GH Workflow Execution Service (WES) instance, a type of web service also described as workflow as a service (17,21); therefore, the testing materials must be written along with the specification of the WES run request. Yevis-cli performs these tests to check if the workflow execution completes successfully. After preparing the Yevis metadata file, Yevis-cli validates the workflow metadata syntax and runs tests using WES in the submitter's local environment. If no WES endpoint is specified, the tests are run using Sapporo (22), a production-ready implementation of WES, and Docker (23), a container virtualization environment. Using these portable WES environments also ensures the portability of testing in Yevis. Finally, Yevis-cli submits the workflow as a GitHub pull request, once it confirms the required actions: the metadata validation and the test passing. This restriction reduces the burden on the registry maintainer because many of the requirements listed in Table 1 can be ensured during the submission process rather than the review process.  Example of the Yevis metadata file. The main workflow description, dependent materials, etc. are described as remote locations; the file contains all the information that the Yevis-cli requires to automate the whole process. This is the actual metadata file for the workflow described in the Section "Sharing workflows using Yevis." This file is automatically updated to the version shown in Figure 7 through the processes within Yevis; for example, the file URL field is replaced by the Zenodo record URL that persists in the associated workflow files.  Fig. 3. The timeline of the workflow submission process using Yevis-cli. The submitter executes four subcommands of Yevis-cli: "make-template," "validate," "test," and "pull-request" in its local environment. The submitter needs to edit a template of the Yevis metadata file using any text editor. The workflow and its metadata need to pass validation and testing before their submission, which helps to reduce the burden on the registry maintainer.

D R A F T
Review process. Figure 4 shows the workflow review process using Yevis-cli. During the review process, registry maintainers examine each workflow submitted as a Yevis metadata file on the GitHub pull request UI. Because the submission method is restricted to Yevis-cli, the submitted workflow is guaranteed to pass validation and testing. To ensure the reproducibility of test results on a local computer, Yevis automatically validates and tests it on GitHub Actions, GitHub's continuous integration/continuous delivery (CI/CD) environment (24). After automated validation and testing, the maintainers review the test results and log files to consider whether to approve the pull request. Rather than using a chat tool or a mailing list, the review process through the GitHub pull request improves the transparency and traceability of workflow publication.
Publication process. Figure 5 shows the workflow submission process using Yevis-cli. During the publication process, the system automatically persists all files associated with the workflow. It generates the TRS response from the Yevis metadata file. The approval of the pull request automatically triggers the publication process on GitHub Actions. In the GitHub Actions script, Yevis-cli uses the Zenodo API to create a new Zenodo upload and persists all files related to the workflow (25). It obtains the DOI and persistent URLs of workflows from Zenodo, and appends them to the Yevis metadata file. Following the Zenodo upload, the Yevis-cli in the GitHub action generates a TRS response JSON file and is deployed to GitHub Pages, GitHub's static web page hosting service. Accordingly, the Yevis metadata file is merged to the default branch of the GitHub repository and deployed to GitHub Pages. With these two files, the TRS response JSON file and the Yevis metadata file, a Yevis registry covers the information that fulfills the requirements of a reusable workflow.

Workflow browsing interface.
To make it easier for registry maintainers and users to browse workflows, we implemented Yevis-web, a workflow browsing interface ( Figure 6). As the interface is a browser-based application implemented in JavaScript, registry maintainers can deploy the browser on GitHub Pages. Yevis-web accesses the TRS API served via GitHub Pages and the GitHub REST API to retrieve workflow information. To help organize the submissions to the registry, the browser shows workflows of both statuses, those already published and those still under the review process.

Proof of concept
To demonstrate that a research community can publish the workflows using Yevis while addressing the requirements listed in Table 1, we built a workflow registry that publishes "DAT2-cwl" workflows with the Yevis system (26). These workflows written in CWL are the appendix of the book Next Generation Sequencer DRY Analysis Manual, 2nd Edition (27) and are maintained by the book's authors and communities. These workflows have been maintained by a community of bioinformatics experts; however, they fulfill only a part of the requirements that we defined. For example, the workflows have test data but would require continuous testing. They also lack workflow metadata in a standard format. Among the DAT2-cwl workflows, we selected a bacterial genome analysis workflow in building a new registry with Yevis (28). This workflow combines the following command line tools: SeqKit (29), FastQC (30), fastp (31), and Platanusb (32). Each tool used in the workflow is packaged in a Docker container. First, we described a Yevis metadata file ( Figure 2) for this workflow using Yevis-cli and appended a test of the workflow in the form of a WES run request. We then performed the workflow registration procedure described in the Section "Workflow registration with automated validation and testing" using Yevis-cli that enable the automation of many of the steps in the validation, testing, reviewing, and publishing. Through the publication procedure of the bacteria genome analysis workflow, we evaluated how the Yevis system addressed the requirements listed in Table 1. Requirements classified as "Availability" were addressed by being uploaded to Zenodo under an appropriate open source license (33). The Yevis metadata file ( Figure 7) (34) and TRS API response ( Figure 8) were updated through Yevis's publication process to use URLs persisted by Zenodo. Requirements classified as "Validity" were addressed by running tests on GitHub Actions. The contents in the Yevis metadata file and the TRS response satisfy the validity requirements, such as workflow type, workflow language version, and the URL of the test results. Requirements classified as "Traceability" were addressed by describing, reviewing, and publishing them in the Yevis metadata file and TRS API response. From the above, we confirmed that Yevis successfully published the bacteria genome analysis workflow while addressing the defined requirements.

Discussion
Through our survey of existing workflow registries, such as Dockstore, WorkflowHub, and nf-core, it was revealed that they are maintained based on numerous contributions by various communities and the use of sufficient computer resources. While these established workflow registries accept submissions and are available for use by researchers, there are still cases in which there is a need to create a new workflow publication platform. For example, in the case of the Bioinformation and DDBJ center, the institute (hereafter referred to simply as DDBJ) needed to have a collection of workflows that would be allowed to run on the WES on their computing platform. Therefore, we designed Yevis as a tool to help workflow developers create a registry to share their workflows. DDBJ used Yevis to create and then to maintain a workflow registry dedicated to workflows for use on the DDBJ WES (35). Yevis can promote the concept of a distributed workflow registry model that underlies the specifications of the GA4GH Cloud Work Stream (17). The API standard for workflow registry specified by GA4GH enables a decentralized model, which promotes diversity in workflow development and in

Publication Timeline Processes
GitHub Pages TRS API response Zenodo Obtain DOI Upload files Updated Yevis metadata file Deploy Fig. 5. The timeline of the workflow publication process using Yevis-cli. All steps are performed automatically on GitHub Actions. All files related to the workflow are persisted by uploading them to Zenodo. The DOI is generated by Zenodo, and the Yevis metadata file is updated to append the DOI information and the persisted file URL. The GitHub action generates a TRS response from the Yevis metadata; it then deploys both of them to GitHub Pages.
the research of analysis methods. Resource sharing, particularly of analysis methods, has a bigger impact on a community studying a minor target with limited human resources. The Yevis system strongly relies on web services, such as GitHub and Zenodo. This is because we aimed to provide support to individuals or communities without sufficient computing resources, but this may result in a lock-in to these web services. Although we believe these services are reliable enough to host valuable workflows, we designed the system to only use substitutable operations of those services, such as version management, file hosting, and continuous script execution. Therefore, we are able to technically build a Yevis registry in an on-premise environment. Compared to existing workflow registries that have a web form for workflow registration, the Yevis system provides only a command-line interface, Yevis-cli, as a method to submit a workflow. This is because we prefer to test workflows locally in advance of submission, while the existing registries test as part of a review process. By using the same test suite on both the submitter's environment (local) and as part of the registry's automatic process (remote), Yevis-cli ensures better reliability of the test results. This also helps to reduce the cost to a registry maintainer by ensuring a workflow is at least runnable on the submitter's local environment.
The Yevis system is a great solution for research communities that aim to share their workflows and wish to establish their own registry as described. However, we recognize it still has some limitations. One of the challenges is the description of D R A F T Fig. 6. Screenshot of Yevis-web. Yevis-web is a browser-based application used via a web browser, which is deployed by workflow registry maintainers and communicates with the TRS API and GitHub REST API to retrieve workflow information. The browser shows both published and under-review workflows to help maintainers in organizing the registry. Upon selecting a workflow of interest, Yevis-web displays more detailed information, such as test results and the contents of the files related to the workflow.  workflow testing. Writing tests for a workflow is difficult because the outputs may be heuristic values, may differ among tool versions, and the total amount of input and output data can be enormous. There is a need for a testing framework that packages test cases and results and generates assertions that allow for slight differences in the output. Debugging a workflow is also difficult because a typical workflow uses many tools that use various programming runtimes. There-

Availability of supporting data and materials
Data and materials related to the DAT2-cwl workflows described in the Section "Sharing workflows using Yevis" are available on GitHub and Zenodo as follows: • GitHub repository for DAT2-cwl workflows (26) • DAT2-cwl/bacteria-genome workflow (28) • Workflow registry yevis-DAT2-cwl (36) • Yevis metadata file for the DAT2-cwl/bacteria-genome workflow (34) • DAT2-cwl/bacteria-genome workflow files uploaded to Zenodo by Yevis (33) • Screenshots and logs of the DAT2-cwl/bacteriagenome workflow review process and publication process (37) • Workflow browser for yevis-DAT2-cwl (38)   Background: Many open-source workflow systems have made bioinformatics data analysis procedures portable. Sharing these workflows provides researchers easy access to high-quality analysis methods without the requirement of computational expertise. However, published workflows are not always guaranteed to be reliably reusable. Therefore, a system is needed to lower the cost of sharing workflows in a reusable form.

D R A F T
Results: We introduce Yevis, a system to build a workflow registry that automatically validates and tests workflows to be published. The validation and test are based on the requirements we defined for a workflow being reusable with confidence. Yevis runs on GitHub and Zenodo and allows workflow hosting without the need of dedicated computing resources. A Yevis registry accepts workflow registration via a GitHub pull request, followed by an automatic validation and test process for the submitted workflow. As a proof of concept, we built a registry using Yevis to host workflows from a community to demonstrate how a workflow can be shared while fulfilling the defined requirements.
Conclusions: Yevis helps in the building of a workflow registry to share reusable workflows without requiring extensive human resources. By following Yevis's workflow-sharing procedure, one can operate a registry while satisfying the reusable workflow criteria. This system is particularly useful to individuals or communities that want to share workflows but lacks the specific technical expertise to build and maintain a workflow registry from scratch.

Background
Due to the low cost and high throughput of measurement instruments that acquire digital data from biological samples, the volume of readily available data has become enormous (1). To obtain scientific knowledge from large datasets, a number of computational data analysis processes are required, for example, in DNA sequencing, sequence read trimming, alignment with reference genomes, and annotation using public databases (2). Researchers have developed analysis tools for each process and often publish them as opensource software (3). To avoid the need to execute these tools manually, researchers usually write a script to combine them into what is called a workflow (4).
To build and maintain a complex workflow that combines many tools efficiently (5), many workflow systems have been developed (6). Some of these systems have large user communities, such as Galaxy (7), the Common Workflow Language (CWL) (8), the Workflow Description Language (WDL) (9), Nextflow (10), and Snakemake (11). Although each system has its unique characteristics, they have a common aim: to make computational methods portable, maintainable, reproducible, and shareable (4). Most systems have a syntax for describing a workflow that is part of what is called a workflow language. They also have an execution system that works with computational frameworks, such as a job scheduler and container virtualization (12). With the popularization of workflow systems, many research communities have worked on workflow sharing in the form of a workflow language. Workflow registries, such as Work-flowHub (13), Dockstore (14), and nf-core (15), have been developed as public repositories for the sharing of workflows. Workflow execution systems also utilize these registries as their tool libraries. To improve the interoperability of workflow registries, the Global Alliance for Genomics Health (GA4GH) proposed the Tool Registry Service (TRS) specification that provides a standard protocol for sharing workflows (16,17). Sharing workflows not only increases the transparency of research but also helps researchers by facilitating the reuse of programs, thereby making data analysis procedures more efficient. However, workflows that are accessible on the internet are not always straightforward for others to use. If a published workflow is not appropriately licensed, researchers cannot use it because the permission for secondary use is unclear. A workflow may also not be executable because its format is incorrect, or dependent files cannot be found. Even if a workflow can be executed, the correctness of its operation often cannot be verified because no tests have been provided. Furthermore, the contact details of the person responsible for the published workflow are not always attached to it. It is noteworthy that these issues in reusing public workflows are not often obvious to workflow developers. To clarify the requirements for workflow sharing, Goble et al. have proposed the concept of a FAIR (findable, accessible, interoperable, and reusable) workflow (18). This inheritance of the FAIR principles (19) focuses on the structure, forms, ver-D R A F T sioning, executability, and reuse of workflows. However, the question remains as to who should guarantee to users that published workflows can be reused following the FAIR workflow guidelines. WorkflowHub asks submitters to take responsibility for workflows: when a workflow is registered on WorkflowHub, the license and author identity should be clearly stated, encouraging them to publish FAIR workflows. However, there is no obligation as to the correctness of the workflow syntax, its executability, or testing. Not placing too many responsibilities on workflow submitters keeps obstacles to submission low, which will likely increase the diversity of public workflows on WorkflowHub; however, it will also likely increase the number of one-off submissions, which one can assume are at higher risk for the workflow problems previously described. Unlike WorkflowHub, in nf-core, the community that operates the registry holds more accountability for published workflows. Workflow submitters are required to join the nfcore community, develop workflows according to their guidelines, and prepare them for testing. These requirements enable nf-core to collect workflows with remarkable reliability. However, the community's effort tends to focus on maintaining more generic workflows that have a large number of potential users. Consequently, nf-core states that it does not accept submissions of bespoke workflows. This is an understandable policy, as maintaining a workflow requires domain knowledge of its content, and this is difficult to maintain in the absence of the person who developed the workflow. In order to improve research efficiency through workflow sharing, research communities need the publication of diverse workflows in a reusable form. However, as shown by existing workflow registries, there is a trade-off between publishing a wide variety of workflows and maintaining the reusability of the workflows that are published. Solving this issue requires reducing the cost to developers in making and keeping their workflows reusable, which currently relies on manual effort. This is achievable by redefining the FAIR workflow concept as a set of technical requirements and providing a system that automates their validation and testing. We introduce Yevis, a system to share workflows with automated metadata validation and test execution. Through the development of Yevis, we specified a set of technical requirements that define a reusable workflow, according to the FAIR workflow concept. Yevis helps researchers and communities share workflows that satisfy the requirements by supporting a build of an independent workflow registry. To allow workflow hosting without the need of additional dedicated computing resources, Yevis works on two public web services: GitHub, a source code sharing and software development service, and Zenodo, a public research data repository. In addition, a Yevis registry provides a web-based workflow browser and the GA4GH TRS-compatible API ensures interoperability with other existing workflow registries. Yevis is particularly powerful when individuals or communities want to share workflows but are without the technical expertise to build and maintain a web application. To demonstrate that workflows can be shared that fulfill the defined requirements using Yevis, we built a registry for workflows that an existing community has managed.

Implementations
System design. First, we defined a set of requirements that the Yevis system can automatically verify or test (Table 1). By satisfying these requirements inspired by FAIR workflow, we consider a workflow is "reusable with confidence." These criteria have three aspects: workflow availability, accessibility, and traceability. To help researchers share reusable workflows, we took an approach to aid them in building their own workflow registry that automatically ensures its reusability. We define a workflow registry as a service that serves workflow information via the GA4GH TRS protocol. The information provided by the TRS API is various workflow metadata, such as author information, documentation, language type and version, dependent materials, testing materials, etc. Large files, such as dependent materials and testing materials, are not directly included in the TRS API response but are described as remote locations, such as HTTP protocol URLs. Therefore, the entities that a workflow registry collects are a set of workflow metadata described in the form of the TRS API response. In this study, therefore, we designed the system as an API server that delivers the TRS API response. In the Yevis registry, a workflow-sharing procedure is divided into three processes: submission, review, and publication ( Figure 1). To address the requirements listed in Table 1, the Yevis system automatically performs processes, such as metadata validation, workflow testing, test provenance generation, persisting associated files, DOI assignment, and TRS response deployment. To generate the TRS API response and publish it while addressing the requirements listed in Table 1, we implemented a command-line application called Yevis-cli. This application contains various utilities to support the workflow registration procedure including validation and testing. As a service and infrastructure to perform these steps, we designed Yevis to use the services of GitHub and Zenodo. Using these web services makes it possible for communities to build a workflow registry without the need of maintaining their own computer servers.

Workflow registration with automated validation and testing.
To set up a Yevis registry, registry maintainers need to do an initial configuration of GitHub and Zenodo; this involves, for example, creating a GitHub repository, changing repository settings, and setting up security credentials. The online documentation "Yevis: Getting Started" shows the step-by-step procedures to deploy a workflow registry and test it (20). We defined the Yevis metadata file, a JSON or YAML format file that contains structured workflow metadata (Figure 2). Yevis-cli uses this file as its input and output in the submission process. The Yevis metadata file will be published on the registry along with the TRS response to provide metadata that D R A F T Table 1. The requirements for a workflow to be considered reusable with confidence. We classify these requirements from the perspectives of the availability, validity, and traceability of the workflows. We propose that these requirements should be assured and provided to users by the workflow registries.

Perspective Requirement Description
Availability Main workflow description The main workflow description file is available and accessible without restriction.

Dependent materials
The dependencies of the main workflow are available, e.g., definitions of dependent workflows and tools.

Testing materials
The job configuration files for testing are available, e.g., parameter and input files. Open-source license The workflow and the related materials are published under an appropriate open-source license.

Validity
Language type The language used to describe the workflow is specified, e.g., CWL, WDL, or Nextflow.

Language version
The version of the workflow language used is specified.

Language syntax
The language syntax of the workflow is valid.

Traceability
Authors and maintainers The contact information of the authors and the maintainers is identified. Documentation The documentation of the workflow is available.

Workflow ID
The unique identifier to specify the workflow is assigned, ideally by a URI. Workflow metadata version The version number of the workflow metadata is specified.  1. The flowchart of the workflow registration to a Yevis repository. The workflow registration procedure is divided into three processes: the submission, review, and publication process. Each process is performed in different locations: in the submitter's local environment, as part of the GitHub pull request, or as the GitHub Actions. The generated TRS API response is deployed to GitHub Pages. The steps indicated by yellow boxes, such as validating metadata, are performed automatically using Yevis-cli.
is not included in the TRS protocol, such as an open-source license.
Submission process. Figure 3 shows the submission process using Yevis-cli. During this submission process, the workflow submitter describes the workflow metadata in their local environment and submits it through a GitHub pull request (i.e., a review request to the registry maintainer). First, Yeviscli generates a template for the Yevis metadata file, which requires the URL of the main workflow description file as an argument. In many workflow systems, the main workflow description file is the entry point for workflow execution. Yevis-cli generates a template supplemented with workflow metadata automatically collected by using the GitHub REST API and inspecting the workflow's contents. Next, the submitter needs to edit the Yevis metadata file template and add workflow tests. Yevis-cli executes a test using a GA4GH Workflow Execution Service (WES) instance, a type of web service also described as workflow as a service (17,21); therefore, the testing materials must be written along with the specification of the WES run request. Yevis-cli performs these tests to check if the workflow execution completes successfully. After preparing the Yevis metadata file, Yevis-cli validates the workflow metadata syntax and runs tests using WES in the submitter's local environment. If no WES endpoint is specified, the tests are run using Sapporo (22), a production-ready implementation of WES, and Docker (23), a container virtualization environment. Using these portable WES environments also ensures the portability of testing in Yevis. Finally, Yevis-cli submits the workflow as a GitHub pull request, once it confirms the required actions: the metadata validation and the test passing. This restriction reduces the burden on the registry maintainer because many of the requirements listed in Table 1 can be ensured during the submission process rather than the review process.  Fig. 2. Example of the Yevis metadata file. The main workflow description, dependent materials, etc. are described as remote locations; the file contains all the information that the Yevis-cli requires to automate the whole process. This is the actual metadata file for the workflow described in the Section "Sharing workflows using Yevis." This file is automatically updated to the version shown in Figure 7 through the processes within Yevis; for example, the file URL field is replaced by the Zenodo record URL that persists in the associated workflow files.  Fig. 3. The timeline of the workflow submission process using Yevis-cli. The submitter executes four subcommands of Yevis-cli: "make-template," "validate," "test," and "pull-request" in its local environment. The submitter needs to edit a template of the Yevis metadata file using any text editor. The workflow and its metadata need to pass validation and testing before their submission, which helps to reduce the burden on the registry maintainer.

D R A F T
Review process. Figure 4 shows the workflow review process using Yevis-cli. During the review process, registry maintainers examine each workflow submitted as a Yevis metadata file on the GitHub pull request UI. Because the submission method is restricted to Yevis-cli, the submitted workflow is guaranteed to pass validation and testing. To ensure the reproducibility of test results on a local computer, Yevis automatically validates and tests it on GitHub Actions, GitHub's continuous integration/continuous delivery (CI/CD) environment (24). After automated validation and testing, the maintainers review the test results and log files to consider whether to approve the pull request. Rather than using a chat tool or a mailing list, the review process through the GitHub pull request improves the transparency and traceability of workflow publication.
Publication process. Figure 5 shows the workflow submission process using Yevis-cli. During the publication process, the system automatically persists all files associated with the workflow. It generates the TRS response from the Yevis metadata file. The approval of the pull request automatically triggers the publication process on GitHub Actions. In the GitHub Actions script, Yevis-cli uses the Zenodo API to create a new Zenodo upload and persists all files related to the workflow (25). It obtains the DOI and persistent URLs of workflows from Zenodo, and appends them to the Yevis metadata file. Following the Zenodo upload, the Yevis-cli in the GitHub action generates a TRS response JSON file and is deployed to GitHub Pages, GitHub's static web page hosting service. Accordingly, the Yevis metadata file is merged to the default branch of the GitHub repository and deployed to GitHub Pages. With these two files, the TRS response JSON file and the Yevis metadata file, a Yevis registry covers the information that fulfills the requirements of a reusable workflow.

Workflow browsing interface.
To make it easier for registry maintainers and users to browse workflows, we implemented Yevis-web, a workflow browsing interface ( Figure 6). As the interface is a browser-based application implemented in JavaScript, registry maintainers can deploy the browser on GitHub Pages. Yevis-web accesses the TRS API served via GitHub Pages and the GitHub REST API to retrieve workflow information. To help organize the submissions to the registry, the browser shows workflows of both statuses, those already published and those still under the review process.

Proof of concept
To demonstrate that a research community can publish the workflows using Yevis while addressing the requirements listed in Table 1, we built a workflow registry that publishes "DAT2-cwl" workflows with the Yevis system (26). These workflows written in CWL are the appendix of the book Next Generation Sequencer DRY Analysis Manual, 2nd Edition (27) and are maintained by the book's authors and communities. These workflows have been maintained by a community of bioinformatics experts; however, they fulfill only a part of the requirements that we defined. For example, the workflows have test data but would require continuous testing. They also lack workflow metadata in a standard format. Among the DAT2-cwl workflows, we selected a bacterial genome analysis workflow in building a new registry with Yevis (28). This workflow combines the following command line tools: SeqKit (29), FastQC (30), fastp (31), and Platanusb (32). Each tool used in the workflow is packaged in a Docker container. First, we described a Yevis metadata file ( Figure 2) for this workflow using Yevis-cli and appended a test of the workflow in the form of a WES run request. We then performed the workflow registration procedure described in the Section "Workflow registration with automated validation and testing" using Yevis-cli that enable the automation of many of the steps in the validation, testing, reviewing, and publishing. Through the publication procedure of the bacteria genome analysis workflow, we evaluated how the Yevis system addressed the requirements listed in Table 1. Requirements classified as "Availability" were addressed by being uploaded to Zenodo under an appropriate open source license (33). The Yevis metadata file (Figure 7) (34) and TRS API response ( Figure 8) were updated through Yevis's publication process to use URLs persisted by Zenodo. Requirements classified as "Validity" were addressed by running tests on GitHub Actions. The contents in the Yevis metadata file and the TRS response satisfy the validity requirements, such as workflow type, workflow language version, and the URL of the test results. Requirements classified as "Traceability" were addressed by describing, reviewing, and publishing them in the Yevis metadata file and TRS API response. From the above, we confirmed that Yevis successfully published the bacteria genome analysis workflow while addressing the defined requirements.

Discussion
Through our survey of existing workflow registries, such as Dockstore, WorkflowHub, and nf-core, it was revealed that they are maintained based on numerous contributions by various communities and the use of sufficient computer resources. While these established workflow registries accept submissions and are available for use by researchers, there are still cases in which there is a need to create a new workflow publication platform. For example, in the case of the Bioinformation and DDBJ center, the institute (hereafter referred to simply as DDBJ) needed to have a collection of workflows that would be allowed to run on the WES on their computing platform. Therefore, we designed Yevis as a tool to help workflow developers create a registry to share their workflows. DDBJ used Yevis to create and then to maintain a workflow registry dedicated to workflows for use on the DDBJ WES (35). Yevis can promote the concept of a distributed workflow registry model that underlies the specifications of the GA4GH Cloud Work Stream (17). The API standard for workflow registry specified by GA4GH enables a decentralized model, which promotes diversity in workflow development and in

Publication Timeline Processes
GitHub Pages TRS API response Zenodo Obtain DOI Upload files Updated Yevis metadata file Deploy Fig. 5. The timeline of the workflow publication process using Yevis-cli. All steps are performed automatically on GitHub Actions. All files related to the workflow are persisted by uploading them to Zenodo. The DOI is generated by Zenodo, and the Yevis metadata file is updated to append the DOI information and the persisted file URL. The GitHub action generates a TRS response from the Yevis metadata; it then deploys both of them to GitHub Pages. the research of analysis methods. Resource sharing, particularly of analysis methods, has a bigger impact on a community studying a minor target with limited human resources. The Yevis system strongly relies on web services, such as GitHub and Zenodo. This is because we aimed to provide support to individuals or communities without sufficient computing resources, but this may result in a lock-in to these web services. Although we believe these services are reliable enough to host valuable workflows, we designed the system to only use substitutable operations of those services, such as version management, file hosting, and continuous script execution. Therefore, we are able to technically build a Yevis registry in an on-premise environment. Compared to existing workflow registries that have a web form for workflow registration, the Yevis system provides only a command-line interface, Yevis-cli, as a method to submit a workflow. This is because we prefer to test workflows locally in advance of submission, while the existing registries test as part of a review process. By using the same test suite on both the submitter's environment (local) and as part of the registry's automatic process (remote), Yevis-cli ensures better reliability of the test results. This also helps to reduce the cost to a registry maintainer by ensuring a workflow is at least runnable on the submitter's local environment.
The Yevis system is a great solution for research communities that aim to share their workflows and wish to establish their own registry as described. However, we recognize it still has some limitations. One of the challenges is the description of D R A F T Fig. 6. Screenshot of Yevis-web. Yevis-web is a browser-based application used via a web browser, which is deployed by workflow registry maintainers and communicates with the TRS API and GitHub REST API to retrieve workflow information. The browser shows both published and under-review workflows to help maintainers in organizing the registry. Upon selecting a workflow of interest, Yevis-web displays more detailed information, such as test results and the contents of the files related to the workflow. name: DAT2-cwl -bacteria genome workflow readme: "https://zenodo.org/api/files/af647e79-1ec9-4947-b7dd-c451faf4a511/README.md" language: type: CWL version: v1.0 files: -url: "https://zenodo.org/api/files/af647e79-1ec9-4947-b7dd-c451faf4a511/bacteria-genome.packed.cwl" target: bacteria-genome.packed.cwl type: primary testing: -id: test files: -url: "https://zenodo.org/api/files/af647e79-1ec9-4947-b7dd-c451faf4a511/test-job-yevis.yml" target: test-job-yevis.yml type: wf_params -url: "https://zenodo.org/api/files/af647e79-1ec9-4947-b7dd-c451faf4a511/DRR024501_1.fastq" target: DRR024501_1.fastq type: other -url: "https://zenodo.org/api/files/af647e79-1ec9-4947-b7dd-c451faf4a511/DRR024501_2.fastq" target: DRR024501_2.fastq type: other Fig. 7. Yevis metadata file for the DAT2-cwl/bacteria-genome workflow. Through Yevis's workflow registration process, this file has been updated from the original shown in Figure 2. The updates include the addition of metadata about Zenodo, a DOI, and the replacement of the URLs of associated files with URLs persisted by Zenodo. workflow testing. Writing tests for a workflow is difficult because the outputs may be heuristic values, may differ among tool versions, and the total amount of input and output data can be enormous. There is a need for a testing framework that packages test cases and results and generates assertions that allow for slight differences in the output. Debugging a workflow is also difficult because a typical workflow uses many tools that use various programming runtimes. There-